
13 research outputs found

TheanoLM - An Extensible Toolkit for Neural Network Language Modeling

Author: Enarvi Seppo
Kurimo Mikko
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2016

We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text. The tool is written using the Python library Theano, which allows researchers to easily extend it and tune any aspect of the training process. Despite this flexibility, Theano is able to generate extremely fast native code that can use a GPU or multiple CPU cores to parallelize the heavy numerical computations. The tool has been evaluated in difficult Finnish and English conversational speech recognition tasks, and significant improvement was obtained over our best back-off n-gram models. The results we obtained in the Finnish task were compared to those from the existing RNNLM and RWTHLM toolkits and found to be as good or better, while training times were an order of magnitude shorter.
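
Scoring a sentence with a language model, one of the tasks named above, amounts to summing per-word log-probabilities, and perplexity is the exponential of the negative average over all words. The sketch below assumes the per-word probabilities have already been produced by some model (the numbers are made up) and does not use TheanoLM's own API.

```python
import math
from typing import Sequence


def sentence_log_prob(word_probs: Sequence[float]) -> float:
    """Natural-log probability of one sentence, given P(w_i | history) for each word."""
    return sum(math.log(p) for p in word_probs)


def perplexity(corpus: Sequence[Sequence[float]]) -> float:
    """Exponential of the negative average per-word log-probability over a corpus."""
    total_log_prob = sum(sentence_log_prob(s) for s in corpus)
    total_words = sum(len(s) for s in corpus)
    return math.exp(-total_log_prob / total_words)


# Hypothetical per-word probabilities, e.g. read from an NNLM's softmax output.
corpus = [[0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
print(perplexity(corpus))  # approximately 4.0
```

A model that gives every word probability 1/4 is as uncertain as a uniform choice among four words, hence a perplexity of about 4.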

arXiv.org e-Print Archive

Crossref

Aaltodoc Publication Archive

Suomenkielinen puheentunnistus hammashuollon sovelluksissa (Finnish-Language Speech Recognition in Dental Care Applications)

Author: Enarvi Seppo
Publication date: 01/01/2012

A significant portion of the working time of dentists and nursing staff goes to writing reports and notes. This thesis studies how automatic speech recognition could ease this workload. The primary objective was to develop and evaluate an automatic speech recognition system for dental health care that records the status of a patient's dentition as dictated by a dentist. The system accepts a restricted set of spoken commands that identify a tooth or teeth and describe their condition. The status of the teeth is stored in a database. In addition to dentition status dictation, the thesis surveyed how well automatic speech recognition would suit the dictation of patient treatment reports. Instead of typing reports on a keyboard, a dentist could dictate them to speech recognition software that automatically transcribes them into text. The vocabulary and grammar in such a system are, in principle, unlimited, which makes it significantly harder to obtain an accurate transcription. Both the status commands and the report dictation language model are in Finnish. Aalto University has developed an unlimited-vocabulary speech recognizer that is particularly well suited to recognizing free-form Finnish speech, but it has previously been used mainly for research purposes. In this project we experimented with adapting the recognizer to grammar-based dictation and to real end-user environments. Nearly perfect recognition accuracy was obtained for dentition status dictation. Letter error rates for the report transcription task varied between 1.3 % and 17 % depending on the speaker, with no obvious explanation for such large inter-speaker variability. The language model for report transcription was estimated from a collection of dental reports. Including a corpus of literary Finnish did not improve the results.
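
The letter error rates quoted above are, in the usual definition, the character-level edit distance between the reference and the recognized text divided by the reference length. The sketch assumes that definition and counts spaces as letters; the thesis's exact normalization (spaces, punctuation, case) is not stated in the abstract, and the example strings are invented.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def edit_distance(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def letter_error_rate(reference: str, hypothesis: str) -> float:
    """Character edits divided by reference length (spaces count as letters here)."""
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted letter in a 12-character reference.
print(letter_error_rate("hammas kuusi", "hammas kuisi"))
```

Applying the same `edit_distance` to lists of words instead of strings gives the word error rate.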

Aaltodoc Publication Archive

User Experiences from L2 Children Using a Speech Learning Application: Implications for Developing Speech Training Applications for Children

Author: Enarvi Seppo
Junttila Katja
Karhila Reima
Kurimo Mikko
Smolander Anna-Riikka
Uther Maria
Ylinen Sari
Publication date: 01/01/2018

We investigated user experiences from 117 Finnish children aged between 8 and 12 years in a trial of an English language learning programme that used automatic speech recognition (ASR). We used measures that encompassed both affective reactions and questions tapping into the children's sense of pedagogical utility. We also tested their perception of sound quality and compared their reactions to game-based and non-game-based versions of the application. Results showed that children gave higher affective ratings to the game version than to the non-game version of the application. Children also expressed a preference for playing with a friend over playing alone or within a group. They found the assessment of their speech useful, although they did not necessarily enjoy hearing their own voices. The results are discussed in terms of their implications for user interface (UI) design in speech learning applications for children.

Peer reviewed

Directory of Open Access Journals

Winchester Research Repository

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

Wolverhampton Intellectual Repository and E-theses

SIAK – A Game for Foreign Language Pronunciation Learning

Author: Dhinakaran Krupakar
Enarvi Seppo
Hämäläinen Perttu
Junttila Katja
Kallio Heini
Karhila Reima
Kurimo Mikko
Nikulin Aleksander
Palomäki Kalle
Rantula Olli
Smolander Anna-Riikka
Uther Maria
Viitanen Vertti
Ylinen Sari
Publication venue: International Speech Communication Association
Publication date: 01/01/2017

Peer reviewed

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

Kuvalähtöinen vajaalaatuisten tukkien tunnistaminen (Image-Based Identification of Substandard Logs)

Author: Enarvi Seppo
Publication date: 01/01/2006

This thesis describes the development of a computer vision system that was installed at the Stora Enso wood handling terminal in Uimaharju. A measurement station scales the logs that the terminal receives, but until now grading has been entirely manual. The computer vision system substantially reduces the workload of the human grader by automatically detecting defects in log end images; the grader only needs to grade the logs that the software suspects of being defective. A comprehensive survey of basic image segmentation techniques is given, with particular attention to their application to color images. The thesis also explains the issues involved in selecting a color space for a particular purpose and reviews the most common color spaces. The development of the computer vision system, which comprises image acquisition, segmentation, object recognition, and feature classification, is described. The main contribution of the thesis is the development of algorithms that localize the end of a log in a camera image and detect whether there are visible defects on the surface of the log end. Localization of the log end is based on three-dimensional tables that represent typical wood colors, and on the circular shape of the log end. Defects are detected using statistical features of the log end pixel colors.

This Master's thesis deals with models, software, and databases in systems biology. It consists of a literature review and an implementation part. The literature review presents the necessary background on systems biology and on the challenges posed by the representation and use of models. It discusses systems biology as a discipline and the special characteristics of the field, and considers how systems biology models are currently represented and how they should be represented. A selection of tools that can be used to obtain models is presented, along with a few databases in which systems biology models are stored and shared. The implementation part covers three computer programs, developed within this thesis, for transferring and editing systems biology data. The focus is on Integrator, a software package for systems biology research. Three new programs were created for Integrator as part of this thesis: a Systems Biology Markup Language (SBML) parser for importing systems biology models into the Integrator environment, a kinetic law editor for modifying models efficiently and more conveniently for the user, and a dataset editor for setting the initial values of simulations and interpreting the results. Two simulations of a Mitogen Activated Protein Kinase (MAPK) cascade model are replicated using an SBML-format model retrieved from the BioModels database. The model is imported into Integrator with the SBML parser, the kinetic law editor is used to modify the model between simulations, and the dataset editor is used to set initial values and view the results.
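
The log-end localization described in this entry (color tables plus the circular shape of the log end) can be sketched as below. The abstract does not give the actual table layout, shape model, or defect classifier, so this is only a stand-in under stated assumptions: the hypothetical "3-D table" is a set of quantized RGB bins collected from labeled wood pixels, the circle is fitted from the centroid and an equal-area radius, and defect detection is omitted.

```python
import math
from typing import Iterable, List, Set, Tuple

RGB = Tuple[int, int, int]
STEP = 32  # hypothetical quantization step: 256 / 32 = 8 bins per color channel


def quantize(pixel: RGB) -> RGB:
    r, g, b = pixel
    return (r // STEP, g // STEP, b // STEP)


def build_color_table(wood_samples: Iterable[RGB]) -> Set[RGB]:
    """3-D color table: the set of quantized (R, G, B) bins seen on labeled wood pixels."""
    return {quantize(p) for p in wood_samples}


def locate_log_end(image: List[List[RGB]], table: Set[RGB]) -> Tuple[float, float, float]:
    """Fit a circle (row, col, radius) to the pixels whose color falls in the table."""
    coords = [
        (y, x)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if quantize(pixel) in table
    ]
    if not coords:
        raise ValueError("no wood-colored pixels found")
    cy = sum(y for y, _ in coords) / len(coords)
    cx = sum(x for _, x in coords) / len(coords)
    radius = math.sqrt(len(coords) / math.pi)  # radius of a disc with the same area
    return cy, cx, radius


# Synthetic 21 x 21 image: a wood-colored disc of radius 6 on a dark background.
WOOD, BACKGROUND = (200, 160, 110), (20, 30, 80)
image = [
    [WOOD if (y - 10) ** 2 + (x - 10) ** 2 <= 36 else BACKGROUND for x in range(21)]
    for y in range(21)
]
table = build_color_table([WOOD, (210, 170, 120)])
print(locate_log_end(image, table))  # center (10.0, 10.0), radius close to 6
```

A centroid and equal-area fit is easily pulled off-center by stray wood-colored pixels, so a real system would likely need a more robust circle fit.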

Aaltodoc Publication Archive

Suomen puhekielen mallintaminen automaattista puheentunnistusta varten (Modeling Conversational Finnish for Automatic Speech Recognition)

Author: Enarvi Seppo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

The accuracy of automatic speech recognizers has been improving steadily for decades. Aalto University has developed automatic recognition of Finnish speech and achieved very low error rates on clearly spoken standard Finnish, such as news broadcasts. Recognition of natural conversations is much more challenging. The language spoken in Finnish conversations also differs in many ways from standard Finnish, and recognizing it requires data that has previously been unavailable. This thesis develops automatic speech recognition for conversational Finnish, starting with the collection of training and evaluation data. For language modeling, large amounts of text are collected from the Internet and filtered to match the colloquial speaking style. An evaluation set is published and used to benchmark progress in conversational Finnish speech recognition. The thesis addresses many difficulties that arise from the very large vocabulary used in Finnish conversations. Using deep neural networks for acoustic modeling and recurrent neural networks for language modeling, it achieves conversational speech recognition accuracy that is already useful in practical applications.

Aaltodoc Publication Archive

Studies on Training Text Selection for Conversational Finnish Language Modeling

Author: Enarvi Seppo
Kurimo Mikko
Publication date: 01/01/2013

Current ASR and MT systems do not handle conversational Finnish, because training data for colloquial Finnish has not been available. Although speech recognition performance on literary Finnish is already quite good, these systems have very poor baseline performance on conversational speech. Text data for relevant vocabulary and language models can be collected from the Internet, but web data is very noisy and most of it is not helpful for learning good models. Finnish is highly agglutinative and written phonetically; even phonetic reductions and sandhi are often written down in informal discussions. This increases the vocabulary size dramatically and causes word-based selection methods to fail. Our selection method explicitly optimizes the perplexity of a subword language model on the development data, and requires only a very limited amount of speech transcripts as development data. The language models have been evaluated for speech recognition using a new data set of generic colloquial Finnish.

Peer reviewed
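
The idea of selecting web text by its effect on development-set perplexity can be illustrated with a greedy loop: keep a candidate sentence only if adding it lowers the perplexity of the development data. This is a simplified stand-in, not the paper's method: it assumes the text is already segmented into subword units, uses an add-one unigram model instead of the paper's subword language model, and the tiny corpora are invented.

```python
import math
from collections import Counter
from typing import List, Sequence

Sentence = Sequence[str]  # a sentence already split into subword units (assumed)


def dev_perplexity(counts: Counter, total: int, vocab_size: int, dev: List[Sentence]) -> float:
    """Perplexity of an add-one unigram model on the development data."""
    log_prob, n = 0.0, 0
    for sent in dev:
        for unit in sent:
            log_prob += math.log((counts[unit] + 1) / (total + vocab_size))
            n += 1
    return math.exp(-log_prob / n)


def select_sentences(candidates: List[Sentence], dev: List[Sentence], seed: List[Sentence]) -> List[Sentence]:
    """Greedily keep each candidate whose addition lowers development-set perplexity."""
    vocab_size = len({u for s in candidates + dev + seed for u in s})
    counts = Counter(u for s in seed for u in s)
    total = sum(counts.values())
    best = dev_perplexity(counts, total, vocab_size, dev)
    selected = []
    for sent in candidates:
        trial = counts + Counter(sent)
        trial_ppl = dev_perplexity(trial, total + len(sent), vocab_size, dev)
        if trial_ppl < best:
            counts, total, best = trial, total + len(sent), trial_ppl
            selected.append(sent)
    return selected


dev = [["mä", "oon", "kotona"]]
seed = [["minä", "olen"]]
candidates = [["mä", "oon"], ["the", "stock", "market"], ["kotona", "mä"]]
print(select_sentences(candidates, dev, seed))  # the off-topic English sentence is rejected
```

A single greedy pass depends on the order of the candidates; operating on subword units rather than whole words is what keeps the counts meaningful when the word vocabulary is huge.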

Aaltodoc Publication Archive

A Novel Discriminative Method for Pruning Pronunciation Dictionary Entries

Author: Enarvi Seppo
Kurimo Mikko
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

In this paper we describe a novel discriminative method for pruning pronunciation dictionaries. The algorithm removes those dictionary entries that have a negative effect on the speech recognition word error rate. The implementation is simple and requires no tunable parameters. We have carried out preliminary speech recognition experiments, pruning multiword pronunciations created by a phonetician. On the task at hand, we achieved only minimal improvements in recognition results. We are optimistic that the algorithm will prove useful in pruning larger dictionaries containing automatically generated pronunciations.

Peer reviewed
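
The abstract does not spell out the discriminative criterion, so the sketch below shows only the simplest brute-force version of the stated goal: try removing each entry and keep the removal if the word error rate drops. The `wer` callable, the entry representation, and the toy error model are all hypothetical. In practice each call would be a full decoding pass over development data, which is what makes this naive loop expensive.

```python
from typing import Callable, FrozenSet, Tuple

Entry = Tuple[str, str]  # (word, space-separated phones): a hypothetical representation


def prune_dictionary(
    entries: FrozenSet[Entry], wer: Callable[[FrozenSet[Entry]], float]
) -> FrozenSet[Entry]:
    """Greedily drop each entry whose removal lowers the word error rate."""
    current = frozenset(entries)
    best = wer(current)
    for entry in sorted(entries):  # fixed order so the result is reproducible
        trial = current - {entry}
        trial_wer = wer(trial)
        if trial_wer < best:
            current, best = trial, trial_wer
    return current


# Toy stand-in for "decode the development set and score it": each entry's
# effect on the error count is invented purely for illustration.
EFFECT = {("kaksi", "k a k s i"): -3, ("kaksi", "k a k s"): +2, ("yksi", "y k s i"): -4}


def toy_wer(dictionary: FrozenSet[Entry]) -> float:
    return (10 + sum(EFFECT[e] for e in dictionary)) / 20  # errors / reference words


pruned = prune_dictionary(frozenset(EFFECT), toy_wer)
print(sorted(pruned))  # the harmful variant ("kaksi", "k a k s") is removed
```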

Crossref

Aaltodoc Publication Archive

Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies

Author: Enarvi Seppo
Kurimo Mikko
Smit Peter
Virpioja Sami
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/09/2017

Today, the vocabulary size of language models in large-vocabulary speech recognition is typically several hundred thousand words. While this is already sufficient in some applications, out-of-vocabulary words still limit usability in others. In agglutinative languages, the vocabulary for conversational speech should include millions of word forms to cover the spelling variations due to colloquial pronunciations, in addition to word compounding and inflection. Very large vocabularies are also needed, for example, when the recognition of rare proper names is important.

Peer reviewed
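
The out-of-vocabulary problem described above is usually quantified as the OOV rate: the fraction of test tokens missing from a vocabulary of the N most frequent training words. The minimal sketch below uses invented Finnish inflections of one stem to show why a vocabulary covering every training word can still miss unseen inflected forms.

```python
from collections import Counter
from typing import Iterable, List


def oov_rate(train_words: Iterable[str], test_words: List[str], vocab_size: int) -> float:
    """Fraction of test tokens missing from the vocab_size most frequent training words."""
    vocab = {w for w, _ in Counter(train_words).most_common(vocab_size)}
    return sum(w not in vocab for w in test_words) / len(test_words)


train = "talo talossa talosta talo taloon".split()
test = "talo talolla talossa".split()
print(oov_rate(train, test, 1))  # only "talo" in the vocabulary: 2 of 3 tokens are OOV
print(oov_rate(train, test, 5))  # every training word included, yet "talolla" is still OOV
```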

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Character-based units for Unlimited Vocabulary Continuous Speech Recognition

Author: Enarvi Seppo
Gangireddy Siva
Kurimo Mikko
Smit Peter
Virpioja Sami
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

We study character-based language models in a state-of-the-art speech recognition framework. This approach has advantages over both word-based systems and so-called end-to-end ASR systems that do not have separate acoustic and language models. We describe the modifications needed to build an effective character-based ASR system using the Kaldi toolkit, and evaluate models based on words, statistical morphs, and characters for both Finnish and Arabic. The morph-based models yield the best recognition results for both well-resourced and lower-resourced tasks, but the character-based models come close to their performance in the lower-resourced tasks, outperforming the word-based models. Character-based models are especially good at predicting novel word forms that were not seen in the training data. Using character-based neural network language models is computationally efficient and provides a larger gain than in the morph- and word-based systems.

Peer reviewed
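
The reason character-based models can predict novel word forms is that any word spelled with known characters is representable. A minimal sketch of turning text into character units with an explicit word-boundary token follows; the `<w>` token is a hypothetical choice, and the paper's actual unit and boundary conventions in Kaldi may differ.

```python
from typing import List, Set

BOUNDARY = "<w>"  # hypothetical word-boundary token


def to_char_units(sentence: str) -> List[str]:
    """Split a sentence into character units, with a boundary token between words."""
    units: List[str] = []
    for i, word in enumerate(sentence.split()):
        if i > 0:
            units.append(BOUNDARY)
        units.extend(word)
    return units


def from_char_units(units: List[str]) -> str:
    """Invert to_char_units: join characters and turn boundary tokens back into spaces."""
    return "".join(" " if u == BOUNDARY else u for u in units)


def is_covered(sentence: str, inventory: Set[str]) -> bool:
    """True if every unit of the sentence appears in the unit inventory."""
    return all(u in inventory for u in to_char_units(sentence))


inventory = set(to_char_units("talo on iso"))
print(to_char_units("on iso"))           # ['o', 'n', '<w>', 'i', 's', 'o']
print(is_covered("taloon", inventory))   # True: unseen word form, but all characters are known
```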

Crossref

Aaltodoc Publication Archive